Abstract

With standard algorithms for generating the classical Kolakoski 
sequence, the numerical calculation of the digit distribution requires a 
linear amount of space. Here, we present an algorithm for calculating 
the distribution of the digits in the classical Kolakoski sequence that 
only requires a logarithmic amount of space and still runs in linear time. 
The algorithm is easily adaptable to generalised Kolakoski sequences. 

1 Introduction 

The classical Kolakoski sequence $K = (K_n)_{n=1}^{\infty}$ is the unique sequence on the alphabet $\{1, 2\}$ that starts with a 1 and is equal to the sequence of its own run lengths. The classical Kolakoski sequence is given in [7, 8], and it appears in the On-Line Encyclopedia of Integer Sequences [13] as entry A000002. The first letters of K are

K = 1   2   2   1   1   2   1   2   2   ...
    |   /\  /\  |   |   /\  |   /\  /\
K = 1   22  11  2   1   22  1   22  11  ...      (1)

There are several interesting questions, answered and unanswered, on the 
properties of the classical Kolakoski sequence; Kimberling presents several 
of these in [6]. One of the simplest, and yet unresolved, questions is that of 
the distribution of digits in K. If we let $o_n$ be the number of 1s in K up to 
and including position n, that is $o_n = |\{i : K_i = 1, 1 \le i \le n\}|$, then the 
conjecture is 

Conjecture 1. The limit $\lim_{n \to \infty} \frac{o_n}{n}$ exists and equals $\frac{1}{2}$.

Both parts of Conjecture 1, the existence and the value, are still open. 
Several aspects of the conjecture (along with other properties and questions 
regarding the Kolakoski sequence as well) are considered by Dekking in 
[3, 4, 5]; see also the survey by Sing [12] and further references therein. 

In [14] Steinsky describes a recursion that generates the letters $K_n$ and uses it to numerically calculate the distribution of the 1s up to $n = 3 \cdot 10^{\dots}$. It is worth noting that a straightforward implementation of Steinsky's recursion leads to an algorithm that either runs in exponential time or requires a linear amount of space. For some time, Steinsky's result raised doubt as to the validity of Conjecture 1; however, subsequent work by Monteil [9] suggested once again that the conjecture should hold. Monteil used a brute force method, requiring linear time and linear space in n, to push the calculation to $n = 10^{\dots}$.

The brute force, or straightforward, method to find $o_n$ generates a prefix of length n of the sequence K, using the intuitive method suggested by (1). That is, starting from a suitable initial sequence, we step through and read off the symbols one by one, with each letter telling us what to write in the sequence beneath, and thus what to append to the end of the current sequence.
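For comparison, the brute force method can be sketched in a few lines of Java. The class and method names are our own, hypothetical choices; the sketch stores the whole prefix of K, which is exactly the linear space requirement that the algorithm of Section 2 avoids.

```java
/**
 * Brute-force baseline (hypothetical names): builds the prefix K_1..K_n in memory,
 * reading the sequence as its own run lengths as in scheme (1). Needs O(n) space.
 */
final class KolakoskiBruteForce {

    /** Returns K_1..K_n in a 0-based array of length max(n, 3). */
    static byte[] prefix(int n) {
        byte[] k = new byte[Math.max(n, 3)];
        k[0] = 1; k[1] = 2; k[2] = 2;                // initial sequence 1 2 2
        int read = 2;                                // k[read] is the length of the next run
        int write = 3;                               // next position to fill
        while (write < k.length) {
            byte symbol = (byte) (3 - k[write - 1]); // runs alternate between 1s and 2s
            for (int r = 0; r < k[read] && write < k.length; r++) {
                k[write++] = symbol;                 // a run of two may be cut off at the end
            }
            read++;
        }
        return k;
    }

    /** o_n: the number of 1s among K_1..K_n. */
    static long countOnes(int n) {
        byte[] k = prefix(n);
        long ones = 0;
        for (int i = 0; i < n; i++) {
            if (k[i] == 1) ones++;
        }
        return ones;
    }
}
```

For instance, countOnes(20) returns 10, since the first twenty letters 1 2 2 1 1 2 1 2 2 1 2 2 1 1 2 1 1 2 2 1 contain ten 1s.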

We present here an algorithm which runs in linear time, yet only requires a logarithmic amount of space to find $o_n$. Using our algorithm, we can easily push the calculation further than the calculation made by Monteil; we present here values of $o_n$ up to $n = 10^{\dots}$ (Table 1). Our calculation indicates that Conjecture 1 should hold, but once again gives no definite answer. We present our algorithm in Section 2 and state and prove the algorithm's run time performance in Section 3. In Section 4, we briefly remark on our algorithm's adaptability to more general Kolakoski sequences, and finally in Section 5 we present the results of our calculations.

2 The Algorithm 

We present here an algorithm for calculating the number of 1s and 2s in the classical Kolakoski sequence K up to a position n. Our algorithm is more memory-efficient than the straightforward algorithm for finding $K_n$: it requires only $O(\log n)$ space (Proposition 4), compared to the $O(n)$ space of a brute force algorithm. Here we use the standard asymptotic notation $O(\cdot)$. That is, we write $f(n) = O(g(n))$ if there is a constant c such that $f(n) \le c\,g(n)$ for all n. (For more on this, see [2].) The run time of our algorithm to find $o_n$ is $O(n)$ (Proposition 5); this is the same as for the brute force method.

The idea in our algorithm is that if we set out only to find $o_n$, we do not have to save the complete sequence up to position n when stepping through the sequence K. As in the intuitive way of generating K, we look back at a previous position to see which symbol run to append. However, this previous position is itself determined by a letter even further back, and so on. If we keep track only of these positions that we "look back at", we can drastically reduce the amount of space needed by the algorithm.

Figure 1: The tree structure in the Kolakoski sequence.

To get a hint of how this can be done, we take as a starting point a 
scheme, as in (1). We see that the upper row defines (or conversely, may 
be defined as) the run lengths of the symbols in the lower one. We expand 
this scheme by adding more rows above and connecting each symbol to the 
symbol in the row above that has (via run length) generated it. In this way, 
we obtain a tree structure, as illustrated in Figure 1. 

We may thus interpret the letters in the classical Kolakoski sequence K as the leaves of a tree (the leaves are the symbols in the bottom row in Figure 1). Each internal node in this tree structure is a symbol in an upper row, interpreted as a run length. Each letter is connected to the letter above that has generated it (called its ancestor), and also to the letter(s) below that it generates, termed its children. This tree structure continues upwards without bound as we step through the symbols of the Kolakoski sequence. However, we only need to go up in the tree until we find an ancestor of the leaf we are currently looking at that is at a left-most position.
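The ancestor relation can be made concrete with a small, hypothetical helper. It is not part of Algorithm 2 and it uses linear space: it records, for every letter of K, the index of the letter that generated it, and counts how many steps up the tree it takes to reach the left-most position of K'. Indices are 0-based, so index 1 is the letter $K_2 = 2$, which generates its own run and is therefore its own ancestor.

```java
/**
 * Hypothetical illustration of the tree in Figure 1, built from a brute-force
 * prefix of K (O(n) space). parent[i] is the 0-based index of the ancestor of K[i].
 */
final class KolakoskiTree {

    static int[] parents(int n) {
        int size = Math.max(n, 3);
        byte[] k = new byte[size];
        int[] parent = new int[size];
        k[0] = 1; k[1] = 2; k[2] = 2;
        parent[0] = 0; parent[1] = 1; parent[2] = 1; // K_2 = 2 generates the run K_2 K_3
        int read = 2, write = 3;
        while (write < size) {
            byte symbol = (byte) (3 - k[write - 1]); // runs alternate between 1s and 2s
            for (int r = 0; r < k[read] && write < size; r++) {
                k[write] = symbol;
                parent[write++] = read;              // K[read] generated this letter
            }
            read++;
        }
        return parent;
    }

    /** Steps up the tree from K[i] until the left-most letter of K' (index 1) is reached. */
    static int height(int[] parent, int i) {
        int h = 0;
        while (i > 1) {                              // parent[i] < i for all i >= 2
            i = parent[i];
            h++;
        }
        return h;
    }
}
```

For example, the chain of ancestors of $K_9$ is $K_6, K_4, K_3, K_2$, so its height is 4. For letters near position $10^6$ the height is only a few dozen, in line with the logarithmic bound of Proposition 4.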

From this point on we shall consider the sequence K', defined by $K = 1K'$. This simplifies matters somewhat, as we then do not have to deal with the left-most 1s at each height in the tree.

The algorithm for finding $o_n$ can concisely be described as an "in-order traversal" of this tree structure, where we start from the lower left and keep track of the symbols we see in the leaves during the traversal. While traversing, we add new ancestors when needed; that is, we build the tree as we traverse it. To reduce the memory requirement, we dynamically generate and keep track only of the part of the tree that we currently use for the traversal. While doing so, we store the ancestors along with an indicator that tells us which of their children we have already traversed. To this end, we introduce pointers $P_k$, which are assigned values from the set $S = \{1, 2, 11, 22\}$. Note that here, a run is defined as a word from the set S. At any given time, the pointer $P_0$ holds the current run in the leaves and $P_1$ holds the ancestor of $P_0$. Similarly, any $P_k$ that has been initiated holds the ancestor of $P_{k-1}$.

We say here that pointers "hold" and not "are" a run because $P_k$ may contain more than just the single-symbol ancestor of $P_{k-1}$; it may also contain a sibling of this ancestor. Here we refer to the single symbols (that is, 1s or 2s) of a two-symbol run (11 or 22) as siblings.

The algorithm can now be described as follows. 

Algorithm 2. 

- To increase (or to assign a new value to) the pointer $P_k$ we proceed as follows. Firstly, if $P_k$ has not been initiated, let $P_k = 22$. If $P_k$, for $k > 0$, contains two symbols, then remove one of the symbols in $P_k$; otherwise (if $k = 0$), increase $P_1$.

If, on the other hand, $P_k$ (with $k > 0$) contains only one symbol, then increase $P_{k+1}$ recursively. When this increment is done (and likewise after increasing $P_1$ in the case $k = 0$), the new run to write in $P_k$ has the length given by the first symbol in $P_{k+1}$, and its symbol(s) are opposite to the symbol(s) previously held by $P_k$. Note that here we do not remove the first symbol of $P_{k+1}$ when we return from the recursion.

- To step through the sequence K (from its second symbol onwards) and calculate $o_n$, we repeatedly increment the pointer $P_0$ and keep track of the number of 1s and 2s that we see. □
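Because each increment of $P_0$ only climbs until it meets a pointer that still holds two symbols (or one that has not been initiated), the recursion in Algorithm 2 can also be unrolled into a loop. The following is a minimal Java sketch under that assumption; the class name KolakoskiPointers and the encoding of the runs 1, 2, 11, 22 as the ints of the same digits (with 0 for "not initiated") are choices made for this example, not taken from the authors' implementation. countOnes counts the leading 1 of $K = 1K'$ separately and truncates the last run when it overshoots n.

```java
import java.util.Arrays;

/** Hypothetical loop form of Algorithm 2; p[k] is the content of pointer P_k. */
final class KolakoskiPointers {
    int[] p = new int[64];                       // 0 = not initiated; grows on demand

    KolakoskiPointers() {
        p[0] = 22;                               // P_0 starts with the first run of K'
    }

    /** One increment of P_0: afterwards p[0] holds the next run of K'. */
    void increment() {
        int k = 1;
        // Single-symbol pointers are used up and need a new run from their ancestor.
        while (true) {
            if (k == p.length) p = Arrays.copyOf(p, 2 * p.length);
            if (p[k] != 1 && p[k] != 2) break;   // two symbols, or not yet initiated
            k++;
        }
        // Initiate as 22 and drop its already used first symbol, or drop one symbol.
        p[k] = (p[k] == 0) ? 2 : p[k] / 11;
        for (int j = k - 1; j >= 0; j--) {
            int symbol = 3 - p[j] % 10;          // opposite of the symbol held before
            int length = p[j + 1] % 10;          // current (first) symbol of the ancestor
            p[j] = (length == 1) ? symbol : 11 * symbol;
        }
    }

    /** o_n, the number of 1s among K_1..K_n, in O(log n) pointers. */
    static long countOnes(long n) {
        if (n <= 0) return 0;
        KolakoskiPointers kp = new KolakoskiPointers();
        long ones = 1, seen = 1;                 // K_1 = 1 is not part of K'
        while (seen < n) {
            int run = kp.p[0];
            long take = Math.min(run >= 11 ? 2 : 1, n - seen); // last run may overshoot n
            if (run % 10 == 1) ones += take;
            seen += take;
            if (seen < n) kp.increment();
        }
        return ones;
    }
}
```

Starting from $P_0 = 22$, the first four increments leave $P_0$ holding 11, 2, 1 and 22, which agrees with the walk-through in Example 3, and countOnes(20) returns 10. Either form is fine for memory, since the recursion depth is bounded by the number of pointers.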

Note that for a given run contained in $P_0$, the algorithm will generate only the pointers $P_1, \ldots, P_M$ above $P_0$, where the ancestor in $P_M$ is at the left-most position in the sequence K'. (It is this height that we shall shortly show is of the order of $\log n$ when $P_0$ holds the nth letter in the sequence.) As we step through the algorithm, we shall see that the successive runs held by the pointer $P_0$ (and also by the other $P_k$) spell out the sequence K'. In code, the increment of $P_0$ (or the step-by-step traversal of the leaves) would be done with the recursive procedure IncrementPointer presented below.
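// Increments the pointer at height k. A pointer P[k] holds one of the
// runs 1, 2, 11 or 22, stored as the int with the same digits; the value
// 0 means that P[k] has not been initiated.
// After initiating P[0] = 22, successive calls to IncrementPointer(0)
// will yield the Kolakoski sequence from the second term onward.

class Kolakoski {
    // The number of pointers grows like log n (Proposition 4); the array
    // size 128 is an arbitrary choice that must exceed the height reached.
    static int[] P = new int[128];

    // True if the run starts with the symbol 1 (that is, it is 1 or 11).
    static boolean startsWithOne(int run) {
        return run == 1 || run == 11;
    }

    static void IncrementPointer(int k) {
        if (P[k] == 0) {                  // P[k] has not been initiated
            P[k] = 22;
        }
        if (k == 0) {
            IncrementPointer(1);
            // New leaf run: its length is the first symbol of P[1], its
            // symbol is opposite to the run previously held by P[0].
            if (startsWithOne(P[0])) {
                P[0] = startsWithOne(P[1]) ? 2 : 22;
            } else {
                P[0] = startsWithOne(P[1]) ? 1 : 11;
            }
        } else if (P[k] == 1) {           // single symbol: ask the ancestor
            IncrementPointer(k + 1);
            P[k] = startsWithOne(P[k + 1]) ? 2 : 22;
        } else if (P[k] == 2) {
            IncrementPointer(k + 1);
            P[k] = startsWithOne(P[k + 1]) ? 1 : 11;
        } else if (P[k] == 11) {          // two symbols: remove one of them
            P[k] = 1;
        } else {                          // P[k] == 22
            P[k] = 2;
        }
    }
}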

To illustrate how the algorithm works, we now go through its initial steps.

Example 3. Incrementing the pointer $P_0$ once is done through the following procedure:



Figure 2: The first increment of the pointer $P_0$.

Figure 2 illustrates the first increment of the pointer $P_0$ in the algorithm. (a) The initiation of $P_0$. The framed symbols 22 are the contents of the pointer $P_0$. (b) To continue our leaf traversal we must generate the next leaf. This is done by looking at the ancestor of the run held by $P_0$. As this ancestor does not exist, we have to generate it; that is, we set $P_1 = 22$. (c) The first symbol of $P_1$ already has children (that is, it generated the initial run held by $P_0$). Therefore we step to the second symbol of $P_1$. The new run to assign to $P_0$ (that is, the new leaf we traverse) is then 11, since the current symbol in $P_1$ is 2 and $P_0$ currently holds the run 22.



Figure 3: The second increment of the pointer $P_0$.



Figure 3 illustrates the second increment of the pointer $P_0$ in the algorithm. (a) To generate the next leaf we have to look at the ancestor of the run currently held by $P_0$; that is, we look at the pointer $P_1$. But since we have already used the symbol in $P_1$, we have to recursively look at the ancestor of $P_1$. This does not exist, so we initiate the ancestor and pointer $P_2 = 22$. (b) As the first symbol of $P_2$ already has children, we step to its second symbol. The new run to assign to $P_1$ is then 11, since the relevant ancestor in $P_2$ is 2 and $P_1$ currently holds the run 22. (c) We have not yet generated any of the children of any of the symbols held by $P_1$, and therefore the current one is the first one. This provides the new run 2 in $P_0$, since the first symbol in $P_1$ is 1 and $P_0$ currently holds 11.



Figure 4: The third increment of the pointer $P_0$.



Figure 4 illustrates the third increment of the pointer $P_0$ in the algorithm. (a) The status of the pointers after the second increment of $P_0$. Note that we have only used the first symbol held by $P_1$. (b) To generate the next leaf we look at the ancestor of the run currently held by $P_0$, that is $P_1$, which contains the run 11. The first symbol already has a child, so we use the second symbol, 1, to generate the new run in $P_0$, which is 1, as $P_0$ currently holds the run 2.




Figure 5: The first part of the fourth increment of the pointer $P_0$.



Figure 5 illustrates the first part of the fourth increment of the pointer $P_0$ in the algorithm. (a) To increase $P_0$ we have to look at the ancestors of the run held by $P_0$. We see that we have used all symbols in all of the ancestors; therefore we have to initiate the new pointer $P_3 = 22$. (b) We have already used the first symbol held by $P_3$, and therefore we step to its second symbol. The new run to assign to $P_2$ is now 11, since $P_3 = 2$ and $P_2 = 22$.




Figure 6: The second part of the fourth increment of the pointer $P_0$.



Figure 6 illustrates the second part of the fourth increment of the pointer $P_0$ in the algorithm. (a) We have not yet used any of the symbols held by $P_2$, and therefore the current one is the first. Then the new symbol in $P_1$ is 2, since the current symbol in $P_2$ is 1 and $P_1 = 1$. (b) The new run to assign to $P_0$ is now 22, since the first symbol in $P_1$ is 2 and $P_0 = 1$.



□ 



Note that the algorithm does not need to keep track of the tree structure that it steps through. The algorithm only keeps track of the current contents of the pointers $P_k$ and how many of each symbol we have seen in $P_0$.



3 Run Time Analysis of the Algorithm 

Let $t_n$ be the number of 2s in K up to and including position n, that is $t_n = |\{i : K_i = 2, 1 \le i \le n\}|$. Recall that we have already similarly defined $o_n$ as the number of 1s. By considering words of the form

11211 and 22122

we see that we have the bounds

$$\frac{1}{4} \le \frac{t_n}{o_n} \le 4 \qquad (2)$$

for $n > 2$. For the analysis, let $P(n)$ be the number of pointers used by Algorithm 2 to calculate $o_n$.

Proposition 4. The amount of space used by Algorithm 2 to find $o_n$ is logarithmic in n. That is, $P(n) = O(\log n)$.

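Proof. Let $w_0 = 122$ and $w_1 = 12211$, and similarly let $w_k$ be the run length sequence defining $w_{k+1}$. Then $w_k$ is a prefix of the sequence K for all $k \ge 0$. (The collection of the words $w_k$ is known as the Kolakoski fan.) Since $|w_{k+1}|$ is the sum of the letters of $w_k$, we have $|w_{k+1}|/|w_k| = 1 + t/(o + t)$, where o and t are the numbers of 1s and 2s in $w_k$. By the frequency bound (2) it follows that

$$\frac{6}{5} \le \frac{|w_{k+1}|}{|w_k|} \le \frac{9}{5}$$

whenever $k \ge 1$, where $|\cdot|$ denotes the length of a word.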

This implies that if pointer $P_0$ holds the symbol at position n in K', then the pointer $P_1$ is at most at position $\frac{5}{6}n$ and at least at position $\frac{5}{9}n$ in the Kolakoski sequence. This argument can now be applied to all pointers. Therefore we have a bound on the number of pointers,

$$P(n) \le \frac{\log n}{\log \frac{6}{5}} = O(\log n),$$

which completes the proof. □
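The fan lengths used in the proof can be checked numerically. Since $|w_{k+1}|$ is the sum of the letters of $w_k$, and every $w_k$ is a prefix of K, the lengths follow from a prefix of K alone. The Java sketch below (class and method names are hypothetical) uses a brute-force prefix, so it is only a sanity check for moderate n, not part of Algorithm 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check of the Kolakoski fan w_0 = 122, w_1 = 12211, ... */
final class KolakoskiFan {

    /** K_1..K_n (0-based), generated by reading K as its own run lengths. */
    static byte[] prefix(int n) {
        byte[] k = new byte[Math.max(n, 3)];
        k[0] = 1; k[1] = 2; k[2] = 2;
        int read = 2, write = 3;
        while (write < k.length) {
            byte symbol = (byte) (3 - k[write - 1]);
            for (int r = 0; r < k[read] && write < k.length; r++) {
                k[write++] = symbol;
            }
            read++;
        }
        return k;
    }

    /** |w_0|, |w_1|, ... for every fan word of length at most n. */
    static List<Long> fanLengths(int n) {
        byte[] k = prefix(n);
        List<Long> lengths = new ArrayList<>();
        long len = 3;                         // |w_0| = |122|
        while (len <= n) {
            lengths.add(len);
            long next = 0;                    // |w_{k+1}| = sum of the letters of w_k
            for (int i = 0; i < len; i++) next += k[i];
            len = next;
        }
        return lengths;
    }
}
```

The first lengths are 3, 5, 7, 10, 15, 23, 34, and from $k = 1$ on the consecutive ratios stay inside $[6/5, 9/5]$, close to $3/2$.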
If Conjecture 1 were shown to be true, it would follow that the number of pointers needed to find $o_n$ is $P(n) \approx \frac{\log n}{\log \frac{3}{2}}$, since each level of the tree would then be about $\frac{3}{2}$ times as long as the level above it.

Proposition 5. Algorithm 2 runs in (amortized) linear time. That is, to find $o_n$ we have to do an amount of work of order $O(n)$.
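Before the proof, the linear bound can be observed empirically by counting how many pointer contents are rewritten, that is, the sum of the quantities $p_k(n)$ used below. The sketch is a hypothetical instrumentation of a loop form of Algorithm 2 (the recursion unrolled, runs encoded as the ints 1, 2, 11, 22, with 0 for "not initiated"), counting one unit of work per pointer assignment.

```java
import java.util.Arrays;

/** Hypothetical work counter for n increments of P_0. */
final class KolakoskiWork {

    static long work(long n) {
        int[] p = new int[64];
        p[0] = 22;                                  // first run of K'
        long assignments = 0;
        for (long step = 0; step < n; step++) {
            int k = 1;
            while (true) {                          // climb past single-symbol pointers
                if (k == p.length) p = Arrays.copyOf(p, 2 * p.length);
                if (p[k] != 1 && p[k] != 2) break;
                k++;
            }
            p[k] = (p[k] == 0) ? 2 : p[k] / 11;     // initiate (22, first used) or drop one
            for (int j = k - 1; j >= 0; j--) {      // fresh runs below, opposite symbols
                int symbol = 3 - p[j] % 10;
                int length = p[j + 1] % 10;
                p[j] = (length == 1) ? symbol : 11 * symbol;
            }
            assignments += k + 1;                   // P_k, P_{k-1}, ..., P_0 were rewritten
        }
        return assignments;
    }
}
```

For 4 and 6 increments it reports 11 and 19 assignments, and for large n the ratio work(n)/n stays bounded, as Proposition 5 asserts.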

Proof. Let us consider the maximal amount of work we have to do to make n increments of the pointer $P_0$ (to generate n runs). Note that making n increments of $P_0$ will actually be enough to find at least $o_{6n/5}$, since in each step we generate a run of one or two symbols, and by (2) at least a fifth of these runs have two symbols. Hence, as we seek a maximum, and including the factor $\frac{6}{5}$ would only decrease the calculated amount of work by a constant factor, we may simplify our calculation by disregarding this factor.

Let $p_k(n)$ be the number of times we change the contents of pointer $P_k$ under these n increments. Then the sum of the $p_k(n)$ gives us the total amount of work we have to do. It is clear that $p_0(n) = n$, since we change $P_0$ at each increment, and from the algorithm we see directly that $p_1(n) = n$. The other pointers do not change every time; for $k \ge 2$ we make a change to $P_k$ only when $P_{k-1}$ consists of a single symbol.

Let $a_k(n)$ be the number of times the pointer $P_k$ holds a single symbol under n increments of $P_0$. Similarly, let $b_k(n)$ be the number of times that $P_k$ holds two symbols under the n increments of $P_0$. From the algorithm we see that to find the maximal amount of work, we have to look for the maximal number of single-symbol pointer contents, since this is what forces us to go recursively higher in the tree. For the pointer $P_0$ it follows from (2) that we have the bounds

$$\frac{1}{4} \le \frac{a_0(n)}{b_0(n)} \le 4$$

for $n > 1$. For pointers higher up, the number of times $P_k$ holds a single symbol is at most four times the number of times it holds two symbols, plus the number of times it holds two symbols, since in the latter case $P_k$ will hold a single symbol in the next step of the algorithm. This gives

$$a_k(n) \le 4b_k(n) + b_k(n) = 5b_k(n).$$

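Therefore, our upper bound on the number of times a pointer holds a single symbol gives a bound on the amount of work we have to do with a pointer $P_{k+1}$ compared to the amount of work for the pointer $P_k$ holding the children of $P_{k+1}$. Indeed, $P_{k+1}$ changes only when $P_k$ holds a single symbol, and $a_k(n) \le 5b_k(n)$ means that this happens in at most $\frac{5}{6}$ of the changes of $P_k$. This is

$$p_{k+1}(n) \le \frac{5}{6}\,p_k(n) \qquad (3)$$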
for $k \ge 1$. The total amount of work we have to do to increment the pointer $P_0$ n times is therefore bounded by the initial amount of work plus the convergent geometric series obtained from (3). Since $p_0(n) = p_1(n) = n$, we have

$$\sum_{i=0}^{\infty} p_i(n) \le c\log n + n + \sum_{i=0}^{\infty} \left(\frac{5}{6}\right)^{i} n = 7n + c\log n = O(n), \qquad (4)$$

where $c\log n$ accounts for the initial amount of work for each pointer before we can apply our estimates above. □

4 Generalised Kolakoski Sequences

In this section we remark that our algorithm is also applicable to generalised Kolakoski sequences. By a generalised Kolakoski sequence we mean a sequence that is defined as its symbols' run lengths, as for the classical Kolakoski sequence, but whose symbols may be taken from any alphabet $\{r, s\}$, where r and s are natural numbers, as discussed in [4]. We denote a generalised Kolakoski sequence over r and s by $K(r, s)$ and shall assume that $K(r, s)$ starts with the symbol r. The classical Kolakoski sequence is then $K = K(1, 2)$.

It is known that if $r + s$ is an even number, then the letter frequency in $K(r, s)$ can be calculated; see [1, 10, 11, 12]. When $r + s$ is odd, the existence and the value of the letter frequencies are still unknown, but the frequencies are believed to exist and to equal $\frac{1}{2}$.

Our algorithm easily adapts to count the letters in a generalised Kolakoski sequence; we may only have to change the initiation of new pointers. By applying the same idea as in the proof of Proposition 4, we see that for a generalised Kolakoski sequence the algorithm uses fewer pointers than for the classical Kolakoski sequence, and therefore the space requirement must again be at most logarithmic.

Similarly, by looking at the proof of Proposition 5, we see that the number of times we use a pointer for a generalised Kolakoski sequence before having to consider its ancestor is larger than for the classical Kolakoski sequence. Therefore the bounding factor (3) for the quotient of the amounts of work at two consecutive levels must be smaller than the $\frac{5}{6}$ given for the classical Kolakoski sequence. Summing up as in (4), we then find that the total amount of work for the generalised Kolakoski sequence is also linear in n.
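A minimal sketch of such an adaptation, under our own assumptions: pointers store a symbol together with the number of letters of its run not yet used (instead of the words 1, 2, 11, 22), and a newly initiated pointer holds the self-generating left-most run $c^c$ with its first letter already used, where $c = r$ if $r \ge 2$ and $c = s$ if $r = 1$. In the latter case the leading 1 is counted separately, as with $K = 1K'$. Only this initiation depends on r and s. The names GeneralisedKolakoski and count are hypothetical, and $r \ne s$ is assumed.

```java
import java.util.Arrays;

/** Hypothetical adaptation of the pointer algorithm to K(r, s). */
final class GeneralisedKolakoski {

    /** Occurrences of the symbol a among the first n letters of K(r, s). */
    static long count(int r, int s, int a, long n) {
        if (n <= 0) return 0;
        int c = (r >= 2) ? r : s;   // the left-most run c^c generates itself
        int[] sym = new int[64];    // current symbol at each height, 0 = not initiated
        int[] left = new int[64];   // letters of that run still held, current one included
        long seen = 0, hits = 0;
        if (r == 1) {               // K(1, s) = 1 K': count the leading 1 by hand
            seen = 1;
            if (a == 1) hits = 1;
        }
        sym[0] = c;                 // first leaf run is c^c
        left[0] = c;
        while (seen < n) {
            long take = Math.min(left[0], n - seen);   // the last run may overshoot n
            if (sym[0] == a) hits += take;
            seen += take;
            if (seen >= n) break;
            int k = 1;              // climb past pointers holding their last letter
            while (true) {
                if (k == sym.length) {
                    sym = Arrays.copyOf(sym, 2 * k);
                    left = Arrays.copyOf(left, 2 * k);
                }
                if (sym[k] == 0 || left[k] > 1) break;
                k++;
            }
            if (sym[k] == 0) {      // new height: run c^c, first letter already used
                sym[k] = c;
                left[k] = c - 1;
            } else {                // move to the next letter of the same run
                left[k]--;
            }
            for (int j = k - 1; j >= 0; j--) {         // fresh runs below
                sym[j] = r + s - sym[j];               // symbols alternate between r and s
                left[j] = sym[j + 1];                  // length = current letter one level up
            }
        }
        return hits;
    }
}
```

With r = 1 and s = 2 this reproduces the classical counts, e.g. count(1, 2, 1, 20) returns 10, and count(2, 3, 2, 27) returns 14 for the prefix 2 2 3 3 2 2 2 3 3 3 2 2 3 3 2 2 3 3 3 2 2 2 3 3 3 2 2 of K(2, 3).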
5 Calculations



In Table 1 we present a short output from an implementation in Java of our Algorithm 2 for calculating the number of 1s in the classical Kolakoski sequence. The program was run on a standard PC. In Table 2 we present results of a calculation of the number of 2s in the generalised Kolakoski sequence $K(2, 3)$, the sequence A071820 in the On-Line Encyclopedia of Integer Sequences [13].

For the classical Kolakoski sequence, we denote the maximal deviation of the proportion of 1s from $\frac{1}{2}$ in a logarithmic decade by

$$D(n) = \max_{\frac{n}{10} \le i \le n} \left| \frac{o_i}{i} - \frac{1}{2} \right|,$$

where $o_i$ is the number of 1s up to position i. We can similarly define the deviation for the generalised Kolakoski sequence $K(2, 3)$.
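As a cross-check of the definition of $D(n)$ for small n, it can be evaluated directly from a brute-force prefix of K. This hypothetical Java helper updates $o_i$ incrementally and takes the maximum over the decade $n/10 \le i \le n$; it uses linear space and is therefore not how large values would be computed with Algorithm 2.

```java
/** Hypothetical evaluation of D(n) for the classical sequence (O(n) space, small n only). */
final class KolakoskiDeviation {

    /** K_1..K_n (0-based), generated by reading K as its own run lengths. */
    static byte[] prefix(int n) {
        byte[] k = new byte[Math.max(n, 3)];
        k[0] = 1; k[1] = 2; k[2] = 2;
        int read = 2, write = 3;
        while (write < k.length) {
            byte symbol = (byte) (3 - k[write - 1]);
            for (int r = 0; r < k[read] && write < k.length; r++) {
                k[write++] = symbol;
            }
            read++;
        }
        return k;
    }

    /** D(n) = max over n/10 <= i <= n of |o_i / i - 1/2|. */
    static double deviation(int n) {
        byte[] k = prefix(n);
        long ones = 0;                    // o_i, updated as i advances
        double d = 0.0;
        for (int i = 1; i <= n; i++) {
            if (k[i - 1] == 1) ones++;
            if (10L * i >= n) {           // i lies in the decade [n/10, n]
                d = Math.max(d, Math.abs((double) ones / i - 0.5));
            }
        }
        return d;
    }
}
```

For example, $D(20) = 1/6$, attained at $i = 3$, where $o_3/3 = 1/3$.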